Abstract. This paper presents a method for the automatic derivation of urban
structure types with a focus on residential areas. It is based on scanned
topographic maps at scale 1:25k and a given urban block geometry from a
national topographic database. The procedure consists of five steps: (1)
definition of a typology of urban structures, (2) extraction of building
footprints, (3) computation of measurements to describe the urban
structure, (4) building classification and (5) derivation of urban structure
types on block level.
1. Introduction

Urban geography and urban planning require detailed information about 
the functional, morphological and socio-economic structure of the built 
environment. The building stock is the most important component in 
settlement areas. It directly affects urban structure, e.g. urban form, density 
of housing and distribution of population. Studying the built environment 
as an interdisciplinary research object requires a common spatial working 
basis. One approach is to set up urban cover types (Pauleit and Duhme, 
2000), also known as Urban Structure Types (UST). It is a domain- 
independent concept that describes spatially homogeneous regions in terms 
of the land cover (water, meadows, settlement etc.), land use (residential, 
non-residential etc.) and other physical characteristics (building size, 
density, arrangement etc.). Urban structure types on mid-scale level (e.g.
statistical block level) can thus be used for interdisciplinary studies, e.g.
modeling quality of housing, material flows and energy consumption or
other socio-economic and ecological aspects.

Despite the great importance of analyzing urban structure for science,
planning and politics, there is no nationwide database on block level.
Mapping USTs is still mostly based on visual interpretation of aerial
photographs and maps and is a very time-consuming process. Automatic
approaches based on remote sensing data can offer an efficient alternative.
However, such approaches have been tested on selected cities only. They
also presuppose the availability of radiometrically homogeneous images,
which are very expensive. Topographic maps at scale 1:25,000 are a
low-cost alternative. They are available nationwide and provide very
homogeneous data. Their temporal coverage reaches back to the very
beginnings of large-scale topographic mapping. Therefore, they allow
studies on dynamics and developments of settlements over much longer
periods than satellite imagery.

2. Related Work 

The need to explain the causes and results of dynamic processes in rapidly
growing metropolises has driven science to develop effective methods to
capture land cover and land use structures as well as their change. With
satellite remote sensing, developed at first for military reconnaissance
purposes, it became possible to look at the earth's surface from above and
to map larger areas of interest. Recording natural resources was a main aim
in the early civilian use of these new data. Aerial and satellite imagery and
the human ability of visual perception allow us to recognize and map urban
land cover and land use types in an efficient way. In the 1970s, the United
States Geological Survey (USGS) developed a standardized scheme for the
interpretation of data on different levels (Anderson et al., 1976).

The first attempts at the automatic derivation of land use classes from
remote sensing data were kernel-based methods (e.g. Wharton, 1982;
Barnsley & Barr, 1996; Gong & Howarth, 1992). Based on a classified image,
the spatial context of every pixel was analyzed in terms of composition and
spatial arrangement. Barr and Barnsley made an important contribution to
the transition from land cover to land use with a graph-based structural
approach (Barnsley & Barr, 1997; Barr & Barnsley, 1997). Their approach
builds on discrete land cover regions, which can be obtained from remote
sensing data with image segmentation techniques. Next, a structural
description of the regions with regard to morphological properties (e.g.
area, compactness) and their spatial relations (e.g. adjacency, inclusion)
is carried out. The authors showed that it is possible to distinguish urban
land use classes on the basis of morphological properties and neighborhood
relations derived in the graph (Barnsley & Barr, 1997; Barr et al., 2004).
Such studies (see also Herold et al., 2003; Mesev, 2005) provide important
empirical support for the thesis of a relation between urban form and
function (Batty & Longley, 1994). This graph-based approach has also been
used recently by Walde et al. (2013).

During the last few years, some work has been done on the automatic
identification and classification of urban structure types. Many of the
early approaches use remote sensing imagery, whereas topographic vector
data and LiDAR data have gained significance more recently. Important
studies in this context are, in chronological order, Herold, Liu and Clarke
(2003), Bauer and Steinnocher (2006), Wu, Xu and Wang (2006), Dogruso
and Aksoy (2007), Banzhaf and Hofer (2008), Meinel et al. (2008a, 2009),
Wurm et al. (2009), Bochow (2010), Vanderhaegen and Canters (2010),
Colaninno, Cladera and Pfeffer (2011) and Walde et al. (2012).

3. Urban Structure Types 

A typology of urban structures can be constructed under consideration of
different criteria. According to the classification scheme of Meinel et al.
(2008), 10 urban structure types can be distinguished (Fig. 1). On the first
level, residential and non-residential buildings are differentiated.
Residential buildings can be further subdivided into single/two-family
houses and multi-family houses. On the next level, the single/two-family
houses can be detached/semi-detached (SFH-D), terraced (SFH-T) or rural
houses (SFH-R). The multi-family houses in turn are classified as open
(MFH-O) or closed (MFH-C) with respect to the arrangement of buildings
in a block. Moreover, there are traditional row houses (MFH-TR), industrial
row houses (MFH-IR) and high-rise buildings with a height greater than
22 m (MFH-HR). Non-residential areas are industrial or commercial (IC)
and buildings of special purpose (SP).
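
For readers who want to work with this typology in code, the ten classes can be kept in a small lookup table. The following Python sketch is only an illustration: the class codes come from the text above, while the names UST_TYPOLOGY and residential_codes and the (use, group) layout are hypothetical choices made for this example.

```python
# Hypothetical encoding of the ten structure types described in the text.
# The codes are taken from the paper; the (use, group) tuples are an assumption.
UST_TYPOLOGY = {
    "SFH-D": ("residential", "single/two-family house"),   # detached/semi-detached
    "SFH-T": ("residential", "single/two-family house"),   # terraced
    "SFH-R": ("residential", "single/two-family house"),   # rural
    "MFH-O": ("residential", "multi-family house"),        # open development
    "MFH-C": ("residential", "multi-family house"),        # closed development
    "MFH-TR": ("residential", "multi-family house"),       # traditional row houses
    "MFH-IR": ("residential", "multi-family house"),       # industrial row houses
    "MFH-HR": ("residential", "multi-family house"),       # high-rise buildings
    "IC": ("non-residential", "industrial/commercial"),
    "SP": ("non-residential", "special purpose"),
}


def residential_codes(typology=UST_TYPOLOGY):
    """Return the codes of all residential types, sorted alphabetically."""
    return sorted(code for code, (use, _group) in typology.items()
                  if use == "residential")


print(residential_codes())
```

The table makes the split into 8 residential and 2 non-residential types explicit.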



[Figure 1: tree diagram of the building typology. Residential: multi-family
house (MFH) with closed development, open development, traditional row
houses, industrial apartments and high rise buildings; single/two-family
house (SFH) with detached/semi-detached, terraced houses and rural
development. Non-residential: industrial/commercial usage; buildings with
special purpose (e.g. public, education, health).]

Figure 1. Building Typology after Meinel et al. (2009) 



4. Data sources to describe urban structure 

In Germany, there is no official data source that contains detailed 
information on urban structure types. Existing digital landscape models 
only differentiate between a few land use classes such as residential, 
commercial, and industrial. For many applications, this thematic resolution 
is simply too low. Local surveying offices sometimes conduct local urban 
structure mappings. However, these mappings are costly and the obtained 
data are heterogeneous due to different classification criteria. 

For the visual classification of urban structure types, various data can be
used, such as remote sensing imagery (air- and spaceborne), topographic
maps and plans, and on-site verification ("ground truthing"). Previous
approaches to automatic mapping of the urban structure mostly use remote
sensing data, a field also known as "urban remote sensing". These data are
used to describe land cover and land use in high spatial and thematic
resolution to answer scientific questions from settlement and urban
geography, urban research, urban morphology, and planning.



Table 1. Possible data sources for the acquisition of urban structure types. 
Spatial coverage: no
Data costs: moderate, high, low
Horizontal: good, poor, good, good
Vertical coverage: poor, poor, moderate, good, good


Table 1 gives an overview of various data sources and their characteristics.
One disadvantage of satellite data is that they are available only for the
last few decades, while topographic paper maps and plans also exist for
"pre-digital" times. The following work therefore focuses on the automated
derivation of urban structure types from topographic maps.



5. Workflow 



The approach for an automatic derivation of urban structure types consists 
of several steps, which will be described in the following subsections. Figure 
2 shows the proposed and applied workflow. 




Figure 2. Applied workflow for the derivation of urban structure types. 

In order to make use of the building footprints depicted in scanned 
topographic maps, methods of cartographic pattern recognition and image 
analysis have to be applied. In the first step the building footprints are 
extracted from the binary map image (1). 

The second step is the classification of building footprints according to a 
given typology (2). This process of classification includes feature extraction, 
feature selection, classifier design, model selection and accuracy 
assessment. During the development of a classifier, different machine
learning classifiers such as Support Vector Machines (SVM) and Random
Forest (RF) are tested and compared with each other to choose the best one.
SVM has already proved to be applicable in land cover classification (Foody
& Mathur, 2004; Brenning et al., 2006) and building classification
(Steiniger et al., 2008). Other algorithms like Bagging and Random Forest
have not yet been used for building classification.



In the third step, the classified building types are aggregated to USTs by 
means of the dominating building type within the urban block. Based on a 
given visual UST mapping of the City of Dresden, Germany, an accuracy 
assessment has been carried out. 

5.1. Input Data 

The German topographic map at scale 1:25K is a low-cost data source for
building footprints that is available nationwide and for multiple points in
time. Compared with larger-scale maps such as 1:5K and 1:10K, it provides a
mostly homogeneous nationwide representation of the building footprints.
However, the buildings are depicted in the same black map layer as
transportation and vegetation signatures.

In addition, urban blocks are used, which are provided by a national
topographic database called ATKIS® base DLM. On the one hand, the block
geometry provides important information during the feature extraction
process. On the other hand, it serves as an aggregation unit to derive the
USTs. The database also offers additional information about the land use,
so that non-residential buildings (industrial/commercial buildings and
buildings with special purpose) can be easily separated from residential
buildings. Since the focus is on residential structures, the classifier only
needs to be trained to separate the 8 residential building types.

5.2. Cartographic pattern recognition for building footprints 

The characteristics of map images described above require methods of
cartographic pattern recognition to make the contained information
explicitly available for spatial analysis and classification. Morphological
filters are used to exclude linear map objects. After a connected-component
analysis, all candidate objects are classified into building and
non-building objects following the methodology described in Herold et al.
(2012). The detection rate for map symbols and signatures is on average
96%, depending on the map quality and digitization parameters. The
building footprint loss due to overlapping map symbols and generalization
is less than 1%.
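
The two generic operations named above, morphological filtering and connected-component analysis, can be illustrated on a toy binary raster. The Python sketch below is not the procedure of Herold et al. (2012); the 3x3 structuring element, the 4-connectivity and the toy image are assumptions chosen to keep the example small. A morphological opening (erosion followed by dilation) removes a one-pixel-wide line while a compact block survives.

```python
from collections import deque


def _neighbourhood(img, y, x):
    """Yield the 3x3 neighbourhood values of (y, x); off-image pixels count as 0."""
    h, w = len(img), len(img[0])
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            ny, nx = y + dy, x + dx
            yield img[ny][nx] if 0 <= ny < h and 0 <= nx < w else 0


def erode(img):
    """Keep a pixel only if its whole 3x3 neighbourhood is foreground."""
    return [[int(all(_neighbourhood(img, y, x))) for x in range(len(img[0]))]
            for y in range(len(img))]


def dilate(img):
    """Set a pixel if any pixel of its 3x3 neighbourhood is foreground."""
    return [[int(any(_neighbourhood(img, y, x))) for x in range(len(img[0]))]
            for y in range(len(img))]


def opening(img):
    """Morphological opening: erosion followed by dilation."""
    return dilate(erode(img))


def connected_components(img):
    """Return the 4-connected foreground regions as lists of (row, col) pixels."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    components = []
    for y in range(h):
        for x in range(w):
            if not img[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue, pixels = deque([(y, x)]), []
            while queue:
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for ny, nx in ((cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)):
                    if 0 <= ny < h and 0 <= nx < w and img[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append(pixels)
    return components


# Toy map layer: a 3x3 "building" and a one-pixel-wide "road" line.
toy_map = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
]
cleaned = opening(toy_map)
candidates = connected_components(cleaned)
print(len(connected_components(toy_map)), "objects before,", len(candidates), "after opening")
```

In a real map the remaining candidate objects would still contain symbols and text, which is why a subsequent classification into building and non-building objects is needed.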



Figure 3. Building footprint retrieval from a topographic map. 

5.3. Features for automatic building classification 

For the classification different measurements (features) are derived by 
means of image processing techniques and spatial analysis. In contrast to 
remote sensing imagery, topographic maps contain discrete map objects 
and no spectral information. Therefore, only geometric, topological or 
contextual features and relations can be used. 

Table 2 shows a selection of the defined measurements. The measurements 
have been derived using HALCON (commercial software for machine 
vision, www.mvtec.com/halcon). For a detailed description of the features 
see Meinel et al. (2008) and Hecht (2013). 



Name        Notes
AreaH       Area of the building object [m²]
Peri        Perimeter of the building object [m]
AreaRect    Area of the minimum bounding rectangle [m²]
Convex      Measurement to describe the convexity, ratio of AreaH to the area of the convex hull
Circular    Measurement to describe the circularity (compactness) of a building
PixDist     Mean distance between the centroid and building contour points [m]
MaxDiam     Maximal distance between two building contour points [m]
RectLeng    Length of the minimum bounding rectangle [m]
RectWid     Width of the minimum bounding rectangle [m]
HR          Binary variable, true if a special signature for high rise buildings was detected [0/1]
NuBuildBl   Number of buildings in an urban block
NuBuildBu   Number of buildings in a 100 m buffer
AreaBlock   Area of the urban block [m²]
NBPBA       Building density in an urban block [1/m²]
BLARERAT    Building coverage rate in an urban block [%]
MinBlDist   Minimal distance of the building contour to the urban block boundary [m]







Table 2. Selection of various morphological building features. 
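
To make some of these measurements concrete, the following Python sketch computes a few of them for a vector polygon. It is an illustration only: the paper derived its features with HALCON on raster objects, whereas this example assumes a footprint given as a list of (x, y) vertices in metres. The function names are hypothetical, and the isoperimetric quotient 4*pi*A/P^2 is used as an assumed stand-in for "Circular", whose exact definition is given in the cited references.

```python
import math


def polygon_area(pts):
    """Shoelace formula; pts is a list of (x, y) vertices in boundary order."""
    n = len(pts)
    twice_area = sum(pts[i][0] * pts[(i + 1) % n][1] - pts[(i + 1) % n][0] * pts[i][1]
                     for i in range(n))
    return abs(twice_area) / 2.0


def polygon_perimeter(pts):
    """Sum of the edge lengths of the closed polygon."""
    n = len(pts)
    return sum(math.dist(pts[i], pts[(i + 1) % n]) for i in range(n))


def convex_hull(pts):
    """Andrew's monotone chain; returns the hull vertices in counter-clockwise order."""
    pts = sorted(set(pts))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def shape_features(pts):
    """Return a few Table 2 style features for one building footprint."""
    area = polygon_area(pts)
    peri = polygon_perimeter(pts)
    return {
        "AreaH": area,
        "Peri": peri,
        "Convex": area / polygon_area(convex_hull(pts)),
        # Assumption: isoperimetric quotient as a simple circularity measure.
        "Circular": 4.0 * math.pi * area / peri ** 2,
        "MaxDiam": max(math.dist(p, q) for p in pts for q in pts),
    }


# Hypothetical L-shaped footprint (coordinates in metres).
l_shape = [(0, 0), (20, 0), (20, 10), (10, 10), (10, 20), (0, 20)]
features = shape_features(l_shape)
print({name: round(value, 3) for name, value in features.items()})
```

For the L-shaped example the area is 300 m² and the convex hull covers 350 m², so the convexity is below 1, which is the kind of signal that separates compact from articulated footprints.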



5.4. Classifier Design 

In previous work, structure types have been derived through an automatic
building classification based on a knowledge-based rule set (Meinel et al.,
2009). Due to the high variability of data quality, such a rigid classifier
may fail when applied to different maps. Pattern recognition and machine
learning provide a wide spectrum of learning algorithms to analyze data
and recognize patterns, which can be used for object classification (e.g.
Duda et al., 2000). Since a user-specific building typology is given, a
supervised learning strategy has been preferred. The training is carried out
on building level, which requires a sufficient amount of training data.

In a model selection process, different machine learning classifiers have
been tested and compared with each other. For evaluation, a 10-fold
cross-validation has been applied, since this is a standard approach to
estimate the generalization error. After choosing the best classifier, all
buildings of the City of Dresden, Germany, are classified. Afterwards, the
USTs are derived on urban block level through an aggregation procedure.
Finally, based on a given UST mapping of the City of Dresden, an accuracy
assessment has been carried out.



5.5. Tested classifiers 

During the development of a suitable classifier, different machine learning
techniques have been tested (Table 3). These are the Support Vector Machine
(SVM) with a Gaussian radial basis function as a kernel as well as
ensemble-based classifiers, like Bagging Trees (BAGGING) and Random Forest
(RF). In addition, classic learning algorithms such as the k-nearest
neighbor classifier (KNN) and Classification and Regression Trees (CART)
have been tested for benchmarking. Table 3 briefly summarizes the tested
methods and their principles.



Abbreviation  Algorithm                        Principle
KNN           K-nearest neighbor classifier    Assign the class label most frequently represented
                                               among the k nearest samples in the training data.
CART          Classification and Regression    Learning algorithm to recursively split the feature
              Tree (Breiman et al., 1984)      space into subsets based on binary decisions. The
                                               result is a decision tree, where leaves represent
                                               class labels and branches the decisions that lead
                                               to those class labels.
BAGGING       Bagging Trees                    The CART algorithm is applied to multiple bootstrap
              (Breiman et al., 1996)           samples. For classification, the results of the
                                               decision trees are combined by voting.
RF            Random Forest algorithm          A Random Forest is an ensemble of decision trees,
              (Breiman, 2001)                  which are constructed based on random samples and a
                                               random subset of features. For classification, the
                                               results of the single trees are combined by voting.
SVM           Support Vector Machine           An SVM is based on nonlinear transformations of the
              (Vapnik, 2000)                   features into a higher-dimensional feature space,
                                               where the classification problem becomes linearly
                                               separable. SVM models are basically binary
                                               classifiers. With aggregation techniques, these can
                                               be made applicable to multi-class problems.



Table 3. Tested classification methods 

All algorithms are implemented in packages for R, which is an open-source 
tool for statistical computing (http://cran.r-project.org). 
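
As an illustration of the simplest method in Table 3, the following sketch implements the k-nearest neighbor rule directly. It is written in Python rather than R and does not reproduce the R packages used in the study; the function name knn_predict and the toy feature values (area in m², perimeter in m) are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Return the label that occurs most often among the k nearest training samples."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training samples: (area, perimeter) of building footprints.
train_X = [(100, 40), (120, 44), (110, 42), (900, 160), (1000, 180), (950, 170)]
train_y = ["SFH-D", "SFH-D", "SFH-D", "MFH-C", "MFH-C", "MFH-C"]

print(knn_predict(train_X, train_y, (130, 46)))
print(knn_predict(train_X, train_y, (980, 175)))
```

The sketch uses raw feature values. With features of very different ranges, as in Table 2, the distances would normally be computed on rescaled features so that no single measurement dominates the vote.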



5.6. Aggregation 

After choosing the best method for building classification, the whole
building stock has been classified with a model trained on all available
training data. Afterwards, the defined urban structure types are derived on
block level. The structure type is determined by the dominant building type
within each block, which is a common criterion in visual aerial photograph
interpretation. Thus, exactly one structure type is assigned to every block;
mixed building types are not considered at this stage.
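
The dominance rule can be sketched in a few lines of Python. The example assumes that dominance is measured by the number of buildings per type; the text does not state whether counts or footprint areas were used, so this is an assumption, and the function name and block identifiers are hypothetical.

```python
from collections import Counter, defaultdict


def aggregate_to_blocks(buildings):
    """Map each block to its most frequent building type.

    buildings: iterable of (block_id, building_type) pairs.
    Assumption: dominance is decided by building count; ties go to the
    type that was encountered first within the block.
    """
    per_block = defaultdict(Counter)
    for block_id, building_type in buildings:
        per_block[block_id][building_type] += 1
    return {block_id: counts.most_common(1)[0][0]
            for block_id, counts in per_block.items()}


# Hypothetical classified buildings in two urban blocks.
classified = [
    ("B1", "SFH-D"), ("B1", "SFH-D"), ("B1", "SFH-T"),
    ("B2", "MFH-IR"), ("B2", "MFH-IR"), ("B2", "MFH-HR"), ("B2", "MFH-IR"),
]
print(aggregate_to_blocks(classified))
```

In block B2 the single high-rise building is outvoted by the three row houses, which shows how a count-based rule can hide types that occur as few, large buildings.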

Figure 4 shows the principle of automated derivation of the structure type
on block level.

Figure 4. From a topographic map to a thematic urban structure type map.



6. Results 



The developed approach has been applied to real geospatial data. The city
of Dresden, Germany, was chosen as the study area. As input data, the
digital topographic map DTK25-V and the German digital landscape model
(ATKIS® base DLM), both from the year 2006, have been used. For training
and validation, 14,810 buildings have been captured, each labeled with one
of the 8 residential building types of the predefined typology.

For the evaluation of the derived urban structure types, an independent UST
mapping of the City of Dresden has been provided by the City. In a
preprocessing step, the reference data had to be semantically generalized
according to the structure types defined in Fig. 1 (8 residential and 2
non-residential).

6.1. Model Selection 

The aim of the model selection is to find the algorithm best suited to the
classification problem. Therefore, the generalization ability of the
classifiers was examined and the model with the best performance has been
chosen. To measure the generalization ability, a 10-fold cross-validation
has been applied. To save computing time during the model selection, a
subset of 5,000 buildings has been randomly selected for the evaluation.
This was necessary since the tuning of the parameters, particularly of the
SVM, is a very time-consuming process.
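
The evaluation scheme can be sketched as follows. This Python example is a minimal k-fold cross-validation that reports the mean accuracy and its standard deviation over the folds; whether the deviations quoted in this paper were computed in exactly this way is an assumption. The stand-in classifier one_nn, the toy data and all function names are hypothetical and only serve to make the example runnable.

```python
import random
import statistics


def k_fold_indices(n, k=10, seed=0):
    """Shuffle the indices 0..n-1 and split them into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(X, y, fit_predict, k=10, seed=0):
    """Return (mean accuracy, standard deviation) over k held-out folds.

    fit_predict(train_X, train_y, test_X) must return one label per test sample.
    """
    scores = []
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        predictions = fit_predict([X[i] for i in train_idx],
                                  [y[i] for i in train_idx],
                                  [X[i] for i in test_idx])
        hits = sum(p == y[i] for p, i in zip(predictions, test_idx))
        scores.append(hits / len(test_idx))
    return statistics.mean(scores), statistics.stdev(scores)


def one_nn(train_X, train_y, test_X):
    """Stand-in classifier: label of the single nearest training value."""
    return [train_y[min(range(len(train_X)), key=lambda i: abs(train_X[i] - x))]
            for x in test_X]


# Toy one-dimensional data set with two classes.
X = list(range(100))
y = ["small" if value < 50 else "large" for value in X]
mean_acc, std_acc = cross_validate(X, y, one_nn, k=10)
print(f"accuracy: {mean_acc:.3f} +/- {std_acc:.3f}")
```

Any of the classifiers in Table 3 could be plugged in through the fit_predict argument, which keeps the comparison between methods on identical folds.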

Figure 5 summarizes the accuracy of the different classifiers. The highest
classification accuracy was obtained using a Random Forest (74.3% ±2.3).
Applying the CART algorithm (64.8% ±2.2) or KNN (68.0% ±2.0) results in
a lower accuracy. The performance of BAGGING (72.5% ±2.1) and SVM
(70.8% ±2.1) was moderate.

Figure 5. Cross-validated prediction accuracy for building classes.



6.2. Accuracy Assessment on building level 

After Random Forest has been chosen as the best method for building
classification, a detailed accuracy assessment is carried out based on all
available data (n = 14,810). For the evaluation, the confusion matrix has
been computed (cf. Fig. 6).

Figure 6. Sum of confusion matrices from 10-fold cross validation.



Using all available reference data for the 10-fold cross-validation, an
overall accuracy of 80.5% (±0.6%) was reached. The producer's accuracy
for terraced houses (SFH-T) and rural houses (SFH-R) is quite low. The
confusion matrix shows problems in separating detached multi-family
houses from single-family houses and rural houses.
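
The two measures used here can be written down compactly. In the Python sketch below, overall accuracy is the share of correctly labeled samples, and producer's accuracy is computed per reference class as the share of its samples that received the correct label. The labels and counts are invented for illustration and are not results from this study; the function names are hypothetical.

```python
from collections import Counter


def confusion_counts(reference, predicted):
    """Count how often each (reference label, predicted label) pair occurs."""
    return Counter(zip(reference, predicted))


def overall_accuracy(reference, predicted):
    """Share of samples whose predicted label equals the reference label."""
    return sum(r == p for r, p in zip(reference, predicted)) / len(reference)


def producers_accuracy(reference, predicted):
    """Per reference class: correctly labeled samples / all samples of that class."""
    totals = Counter(reference)
    counts = confusion_counts(reference, predicted)
    return {label: counts[(label, label)] / totals[label] for label in totals}


# Invented toy labels, not data from the study.
reference = ["SFH-D"] * 4 + ["SFH-T"] * 2 + ["MFH-O"] * 4
predicted = ["SFH-D", "SFH-D", "SFH-D", "MFH-O",
             "SFH-D", "SFH-T",
             "MFH-O", "MFH-O", "MFH-O", "SFH-D"]

print(overall_accuracy(reference, predicted))
print(producers_accuracy(reference, predicted))
```

A class with few samples, such as SFH-T in this toy example, can have a low producer's accuracy while the overall accuracy remains comparatively high.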



6.3. Accuracy assessment on block level 

After aggregation of the information on block level, the results have been
evaluated against an external UST mapping provided by the City of Dresden.

Comparing the results with the reference UST mapping, an overall accuracy
of 82.3% was observed. Terraced house and rural house structures cannot
be distinguished adequately. For modeling, it is recommended to merge
these classes into one. High-rise building structures have been
misclassified due to the dominance rule applied during the aggregation
procedure.

[Figure: UST-specific accuracy on block level. Number of classes: 9; number
of blocks: 7,195.]

Figure 6. Result of the comparison of the automatically derived UST to the
reference mapping from Dresden (n=7,195).



7. Conclusion 



Our approach offers an efficient and low-cost way to map residential urban
structures. Additionally, topographic maps provide the basis for
multi-temporal mapping. Therefore, the presented method could be of
particular interest for spatial sciences (e.g. studying urban form and
dynamics) as well as planning (e.g. infrastructure planning, urban and
regional planning).

The comparison of the derived urban structure types with existing
mappings has shown an accuracy of over 80%. Although this accuracy
meets the requirements for land use and land cover mapping (Anderson et
al., 1976), the producer's accuracy for terraced houses, high-rise buildings
and rural buildings is unsatisfactory. Thus, this approach is only suitable
for supra-regional applications where no other data are available.

In the future, further evaluations are necessary. In particular, the method
needs to be applied to other test areas in order to examine its
transferability.